Take-home Exercise 4: Prototyping Modules for Visual Analytics Shiny Application
1. Project Overview
Our group project aims to help tenants navigate the Singapore rental market by building a visual analytics application in R Shiny. The core objective is to predict future trends in housing rental prices across regions so that tenants can make informed rental decisions.
In Take-home Exercise 4, the task is to prototype a module of the application our team designed and to select appropriate Shiny UI components for it. This step helps ensure the team project can be completed successfully: the chosen components should showcase the design's features and make the application easy and intuitive to use.
Through this process, we verify that the selected R packages are supported, test the correctness of the R code, and determine the parameters and outputs to expose in the Shiny application. Our goal is a full-featured, user-friendly visualization tool that helps users gain insight into rental trends in different parts of Singapore, thereby supporting tenants’ decision-making.
2. Data Preparation
Our group obtained rental data for Singapore from the Urban Redevelopment Authority. We searched the website, selected the data that met our needs, then consolidated and merged it into a single CSV file.
2.1 Install and Load packages
pacman::p_load(ggplot2, dplyr, tidyr, plotly, corrplot, readr, ggstatsplot, png)
2.2 Import Data
Rental_data <- read_csv("data/ResidentialRental_Final.csv")
head(Rental_data)
# A tibble: 6 × 21
  Column1  Year Project_Name   Street_Name       Postal_District Planning_Region
    <dbl> <dbl> <chr>          <chr>                       <dbl> <chr>
1       0  2021 1 LOFT         LORONG 24 GEYLANG              14 East Region
2       0  2021 1 CANBERRA     CANBERRA DRIVE                 27 North Region
3       0  2021 HILLVIEW PARK  HILLVIEW AVENUE                23 West Region
4       0  2021 STRATA         ESSEX ROAD                     11 Central Region
5       0  2021 EASTERN LAGOON UPPER EAST COAST…              15 East Region
6       0  2021 ONE JERVOIS    JERVOIS CLOSE                  10 Central Region
# ℹ 15 more variables: Property_Type <chr>, No_of_Bedroom <dbl>,
# Monthly_Rent_SGD <dbl>, Monthly_Rent_PSM <dbl>, Monthly_Rent_PSF <dbl>,
# Floor_Area_SQM_Avg <dbl>, Floor_Area_SQFT_Avg <dbl>,
# Lease_Commencement_Date <chr>, interest_rate <dbl>, nearest_mrt <chr>,
# distance_to_mrt <dbl>, nearest_school <chr>, distance_to_school <dbl>,
# latitude <dbl>, longitude <dbl>
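A quick missing-value count right after import can explain why the models later in this document are fitted on different numbers of rows: lm() silently drops rows that have an NA in any variable it uses. The sketch below shows one way to do that count. The toy_rental data frame and its values are made up for illustration; only the column names come from Rental_data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for Rental_data (values are invented)
toy_rental <- data.frame(
  Property_Type      = c("Apartment", "Condo", "Condo", "Terrace"),
  No_of_Bedroom      = c(2, NA, 3, NA),
  Floor_Area_SQM_Avg = c(75, 95, 120, 210),
  Monthly_Rent_SGD   = c(3200, 4100, 5200, 7800)
)

# Count missing values in every column and reshape to one row per column
na_counts <- toy_rental %>%
  summarise(across(everything(), ~ sum(is.na(.x)))) %>%
  pivot_longer(everything(), names_to = "column", values_to = "n_missing")

print(na_counts)
```

Running the same pipeline on Rental_data would show which columns drive the row loss before any model is fitted.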
3. Analysis
3.1 Exploratory Data Analysis (EDA)
Analysis of rental influencing factors: We explore the relationship between Monthly_Rent_SGD and Floor_Area_SQM_Avg, using Property_Type and Planning_Region as faceting variables.
3.1.1 Correlation between Floor_Area_SQM_Avg and Monthly_Rent_SGD
# Scatter plot with a nonparametric (Spearman) correlation, faceted by property type
ggscatterstats(data = Rental_data,
               x = Floor_Area_SQM_Avg, y = Monthly_Rent_SGD,
               type = "nonparametric") +
  facet_wrap(vars(!!sym("Property_Type"))) +
  labs(x = "Floor_Area_SQM_Avg", y = "Monthly_Rent_SGD") +
  theme_minimal()
# Same plot, faceted by planning region
ggscatterstats(data = Rental_data,
               x = Floor_Area_SQM_Avg, y = Monthly_Rent_SGD,
               type = "nonparametric") +
  facet_wrap(vars(!!sym("Planning_Region"))) +
  labs(x = "Floor_Area_SQM_Avg", y = "Monthly_Rent_SGD") +
  theme_minimal()
Insights: Looking at the correlation between Floor_Area_SQM_Avg and Monthly_Rent_SGD, we found that regardless of planning region or property type, the general trend is that the larger the average floor area, the higher the monthly rent.
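To back the visual impression with numbers, the per-group correlation can also be tabulated directly. The sketch below computes Spearman's rho, the rank correlation that matches the type = "nonparametric" setting, for each property type. The toy_rental data and its values are hypothetical; on the real data the same pipeline would run on Rental_data.

```r
library(dplyr)

# Hypothetical toy data (values are invented for illustration)
toy_rental <- data.frame(
  Property_Type      = rep(c("Apartment", "Condo"), each = 4),
  Floor_Area_SQM_Avg = c(50, 70, 90, 110, 60, 80, 100, 140),
  Monthly_Rent_SGD   = c(2500, 3000, 3400, 4200, 3000, 3600, 4500, 6000)
)

# Spearman rank correlation between floor area and rent within each group
rho_by_type <- toy_rental %>%
  group_by(Property_Type) %>%
  summarise(
    n = n(),
    spearman_rho = cor(Floor_Area_SQM_Avg, Monthly_Rent_SGD,
                       method = "spearman"),
    .groups = "drop"
  )

print(rho_by_type)
```

Swapping Property_Type for Planning_Region in group_by() gives the regional breakdown.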
3.1.2 Explore the relationship between No_of_Bedroom and Floor_Area_SQM_Avg
boxplot <- plot_ly(data = Rental_data,
                   x = ~No_of_Bedroom,
                   y = ~Floor_Area_SQM_Avg,
                   type = "box",
                   boxpoints = FALSE, # Outliers are not displayed
                   jitter = 0.3,
                   line = list(color = "black"))
# Median floor area per bedroom count, used for the overlay line
median_values <- Rental_data %>%
  group_by(No_of_Bedroom) %>%
  summarise(median_value = median(Floor_Area_SQM_Avg, na.rm = TRUE))
boxplot <- boxplot %>%
  add_lines(x = median_values$No_of_Bedroom,
            y = median_values$median_value,
            mode = "lines",
            line = list(color = "red"),
            name = "Median Line")
# Pass the axis settings to layout() as named arguments
interactive_boxplot <- boxplot %>%
  layout(xaxis = list(title = "No_of_Bedroom"),
         yaxis = list(title = "Floor_Area_SQM_Avg"))
interactive_boxplot
Insights: The box plot shows that floor area generally increases with the number of bedrooms.
3.2 Confirmatory Data Analysis (CDA)
3.2.1 Verification 1 - Housing type significantly affects monthly rent.
H0: There is no significant difference in monthly rent by housing type.
H1: Monthly rent varies significantly depending on the type of house.
# In the linear model, Monthly_Rent_SGD is the response variable and Property_Type is the predictor variable
lm_model <- lm(Monthly_Rent_SGD ~ Property_Type, data = Rental_data)
# Perform ANOVA tests
anova_result <- anova(lm_model)
print(anova_result)
Analysis of Variance Table

Response: Monthly_Rent_SGD
                  Df     Sum Sq    Mean Sq F value    Pr(>F)
Property_Type      4 4.0278e+11 1.0069e+11   11981 < 2.2e-16 ***
Residuals     192194 1.6153e+12 8.4048e+06
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
if (anova_result$`Pr(>F)`[1] < 0.05) {
cat("Rejecting the null hypothesis, there is a significant difference.\n")
} else {
cat("Failing to reject the null hypothesis, monthly rents do not differ significantly between different housing types.\n")
}
Rejecting the null hypothesis, there is a significant difference.
Insights: We reject the null hypothesis. The ANOVA (F = 11981 on 4 and 192194 degrees of freedom, p < 2.2e-16) gives strong evidence that mean monthly rent differs between property types.
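With roughly 192,000 rows, a tiny p-value says little about how large the difference is. One common complement is eta squared, the share of total variance attributed to the grouping factor. The sketch below computes it from the sums of squares printed in the ANOVA table above. The helper name eta_squared is our own, not a function from any loaded package.

```r
# Sums of squares copied from the printed ANOVA table for Property_Type
ss_property_type <- 4.0278e11
ss_residuals     <- 1.6153e12

# Hypothetical helper: eta squared for a one-way ANOVA,
# i.e. effect sum of squares divided by total sum of squares
eta_squared <- function(ss_effect, ss_error) {
  ss_effect / (ss_effect + ss_error)
}

eta_sq <- eta_squared(ss_property_type, ss_residuals)
round(eta_sq, 3)
```

The result is about 0.20, meaning property type alone accounts for roughly a fifth of the variance in monthly rent.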
A visualization plot showing the distribution of monthly rent across different housing types
plot_ly(data = Rental_data,
x = ~Property_Type,
y = ~Monthly_Rent_SGD,
type = "violin",
box = list(visible = TRUE),
points = "none",
color = ~Property_Type) %>%
layout(xaxis = list(title = "Property Type"),
yaxis = list(title = "Monthly Rent SGD"))
3.2.2 Verification 2 - Planning region has a significant impact on monthly rent
H0: There is no significant difference in monthly rent between planning regions.
H1: Monthly rent differs significantly between planning regions.
# In the linear model, Monthly_Rent_SGD is the response variable and Planning_Region is the predictor variable
lm_model_region <- lm(Monthly_Rent_SGD ~ Planning_Region, data = Rental_data)
anova_result_region <- anova(lm_model_region)
print(anova_result_region)
Analysis of Variance Table

Response: Monthly_Rent_SGD
                    Df     Sum Sq    Mean Sq F value    Pr(>F)
Planning_Region      4 1.9013e+11 4.7534e+10  4997.7 < 2.2e-16 ***
Residuals       192194 1.8280e+12 9.5112e+06
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
if (anova_result_region$`Pr(>F)`[1] < 0.05) {
cat("Rejecting the null hypothesis, there is a significant difference in monthly rents between different planning regions.\n")
} else {
cat("Failing to reject the null hypothesis, monthly rents do not differ significantly between different planning regions.\n")
}
Rejecting the null hypothesis, there is a significant difference in monthly rents between different planning regions.
Insights: We reject the null hypothesis and conclude that mean monthly rent differs significantly between planning regions (F = 4997.7 on 4 and 192194 degrees of freedom, p < 2.2e-16).
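The ANOVA only says that at least one region differs; it does not say which pairs differ. A possible follow-up is Tukey's HSD test from base R's stats package, sketched below. The toy_region data is simulated with made-up means, so the output says nothing about the real rental data.

```r
set.seed(42)

# Hypothetical simulated rents for three regions (means and SDs are invented)
toy_region <- data.frame(
  Planning_Region = factor(rep(c("Central Region", "East Region", "West Region"),
                               each = 30)),
  Monthly_Rent_SGD = c(rnorm(30, mean = 6000, sd = 500),
                       rnorm(30, mean = 4000, sd = 500),
                       rnorm(30, mean = 3800, sd = 500))
)

# TukeyHSD() needs an aov fit rather than an lm fit
aov_region <- aov(Monthly_Rent_SGD ~ Planning_Region, data = toy_region)

# Pairwise differences in means with family-wise adjusted p-values
tukey_region <- TukeyHSD(aov_region)
print(tukey_region)
```

On the real data, the same two calls with data = Rental_data would list every pair of planning regions with its adjusted p-value.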
A bar chart showing the mean monthly rent in each planning region
grouped_bar_chart <- ggplot(Rental_data, aes(x = Planning_Region, y = Monthly_Rent_SGD, fill = Planning_Region)) +
geom_bar(stat = "summary", fun = "mean") +
labs(x = "Planning Region", y = "Mean Monthly Rent SGD") +
theme_minimal()
interactive_grouped_bar_chart <- ggplotly(grouped_bar_chart)
interactive_grouped_bar_chart
3.2.3 Verification 3 - The number of bedrooms has a significant effect on the size of the house
H0: The number of bedrooms has no significant effect on the size of the house.
H1: The number of bedrooms has a significant effect on the size of the house.
# In the linear model, Floor_Area_SQM_Avg is the response variable and No_of_Bedroom is the predictor variable
lm_model <- lm(Floor_Area_SQM_Avg ~ No_of_Bedroom, data = Rental_data)
anova_result <- anova(lm_model)
print(anova_result)
Analysis of Variance Table

Response: Floor_Area_SQM_Avg
                  Df    Sum Sq   Mean Sq F value    Pr(>F)
No_of_Bedroom      1 288590572 288590572  221623 < 2.2e-16 ***
Residuals     164673 214431948      1302
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
if (anova_result$`Pr(>F)`[1] < 0.05) {
cat("Rejecting the null hypothesis, there is a significant difference.\n")
} else {
cat("Failing to reject the null hypothesis, bedroom count does not significantly affect floor area.\n")
}
Rejecting the null hypothesis, there is a significant difference.
Insights: We reject the null hypothesis. Because No_of_Bedroom is stored as a number, the model fits a single slope (1 degree of freedom), so the result indicates a significant linear relationship between the number of bedrooms and floor area. From the sums of squares above, bedroom count accounts for roughly 57% of the variance in floor area.
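Whether No_of_Bedroom enters the model as a number or as a category changes what is being tested. The sketch below fits both versions on a small made-up data set. The numeric model spends one degree of freedom on a straight-line trend, while the factor model estimates a separate mean per bedroom count. All values in toy are hypothetical.

```r
# Hypothetical toy data: five units for each bedroom count from 1 to 4
toy <- data.frame(
  No_of_Bedroom = rep(1:4, each = 5),
  Floor_Area_SQM_Avg = c(45, 50, 48, 52, 47,
                         70, 75, 72, 78, 74,
                         100, 105, 98, 110, 102,
                         150, 160, 155, 148, 162)
)

# Numeric predictor: a single slope, so 1 degree of freedom
fit_numeric <- lm(Floor_Area_SQM_Avg ~ No_of_Bedroom, data = toy)

# Categorical predictor: one mean per level, so (levels - 1) degrees of freedom
fit_factor <- lm(Floor_Area_SQM_Avg ~ factor(No_of_Bedroom), data = toy)

df_numeric <- anova(fit_numeric)$Df[1]
df_factor  <- anova(fit_factor)$Df[1]

# The straight-line model is nested in the factor model,
# so an F test can check whether the extra flexibility is needed
print(anova(fit_numeric, fit_factor))
```

If the comparison is significant, the group means depart from a straight line and the factor version may describe the data better.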
A visualization plot showing the distribution of the number of bedrooms and house area
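# Keep rows that are complete in the three columns used below.
# na.omit() ignores the column names passed to it and drops rows with an NA in any column.
filtered_data <- Rental_data %>%
  drop_na(No_of_Bedroom, Floor_Area_SQM_Avg, Property_Type)
lm_model <- lm(Floor_Area_SQM_Avg ~ No_of_Bedroom, data = filtered_data)
# Fitted floor area for every row, used to draw the regression line
predictions <- predict(lm_model, filtered_data)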
plot_ly(data = filtered_data,
        x = ~No_of_Bedroom,
        y = ~Floor_Area_SQM_Avg,
        type = "scatter",
        mode = "markers",
        color = ~Property_Type, # map marker colour to property type
        text = ~paste("Property Type: ", Property_Type, "<br>No of Bedroom: ", No_of_Bedroom, "<br>Floor Area SQM Avg: ", Floor_Area_SQM_Avg)) %>%
  # inherit = FALSE keeps the colour grouping from splitting the fitted line
  add_lines(x = ~No_of_Bedroom,
            y = ~predictions,
            line = list(color = "red"),
            name = "Linear fit",
            inherit = FALSE) %>%
  layout(xaxis = list(title = "No of Bedroom"),
         yaxis = list(title = "Floor Area SQM Avg"))
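As a rough illustration of how the faceting variable from Section 3.1.1 might be exposed as a parameter in the Shiny prototype, the sketch below wires a selectInput to a faceted scatter plot. It is an assumption-laden outline, not the team's final design. The shiny package is not in the p_load() call above, a plain ggplot2 scatter stands in for ggscatterstats to keep the example light, and facet_scatter, toy_rental, and the input and output IDs are all hypothetical names.

```r
library(shiny)
library(ggplot2)

# Hypothetical toy data standing in for Rental_data (values are invented)
toy_rental <- data.frame(
  Property_Type      = rep(c("Apartment", "Condo"), times = 4),
  Planning_Region    = rep(c("Central Region", "East Region"), each = 4),
  Floor_Area_SQM_Avg = c(50, 70, 90, 110, 60, 80, 100, 140),
  Monthly_Rent_SGD   = c(2500, 3000, 3400, 4200, 3000, 3600, 4500, 6000)
)

# Hypothetical helper: scatter plot faceted by a column given as a string,
# using the same vars(!!sym(...)) pattern as the EDA code
facet_scatter <- function(data, facet_var) {
  ggplot(data, aes(x = Floor_Area_SQM_Avg, y = Monthly_Rent_SGD)) +
    geom_point() +
    facet_wrap(vars(!!rlang::sym(facet_var))) +
    theme_minimal()
}

ui <- fluidPage(
  selectInput("facet_var", "Facet by:",
              choices = c("Property_Type", "Planning_Region")),
  plotOutput("scatter")
)

server <- function(input, output, session) {
  # Redraws whenever the user picks a different faceting variable
  output$scatter <- renderPlot(facet_scatter(toy_rental, input$facet_var))
}

# Build the app object without launching it; start it with runApp(app)
app <- shinyApp(ui, server)
```

Keeping the plotting logic in a plain function makes it testable outside Shiny and lets the same code serve the static analysis and the app.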